Feedback between stochastic gene networks and population dynamics enables cellular decision-making

Phenotypic selection occurs when genetically identical cells are subject to different reproductive abilities due to cellular noise. Such noise arises from fluctuations in reactions synthesizing proteins and plays a crucial role in how cells make decisions and respond to stress or drugs. We propose a general stochastic agent-based model for growing populations capturing the feedback between gene expression and cell division dynamics. We devise a finite state projection approach to analyze gene expression and division distributions and infer selection from single-cell data in mother machines and lineage trees. We use the theory to quantify selection in multi-stable gene expression networks and elucidate that the trade-off between phenotypic switching and selection enables robust decision-making essential for synthetic circuits and developmental lineage decisions. Using live-cell data, we demonstrate that combining theory and inference provides quantitative insights into bet-hedging–like response to DNA damage and adaptation during antibiotic exposure in Escherichia coli.


This PDF file includes:
Sections S1 to S4 Figs.S1 to S11 Tables S1 and S2 References whose dynamics are encoded in the transition matrix: where δ is the Kronecker delta, w r (x ′ , τ ) are the reaction propensities, and ν r = (ν + 1r − ν − 1r , . . ., ν + 1r − ν − 1r ) T is the reaction stoichiometry.In the following we omit the age-dependence of the elements of the matrix Q(τ ) and simply write q(x, x ′ ) = Q x,x ′ (τ ).
As done in [22,23] we derive the time-evolution of snapshot density n(x, τ, t) corresponding to the mean number of cells with molecule counts x and age τ at time t as with the boundary condition The sum is taken over the countable state space S for the trait vector x.The boundary condition corresponds to the cell divisions replacing a mother cell by m newborn daughter cells with K(x|x ′ ) denoting the probability that a daughter cell inherits x molecules from a total of x ′ molecules of the mother cell.The division rate γ(x, τ ) depends on both the cell age as well as the cell trait vector x.Letting n(x, τ, t) ∼ e λt Π(x|τ )Π(τ ), we obtain Eqs.(1).Finally m = 2 corresponds to the population lineage trees where each division results in two offspring while m = 1 corresponds to tracking a single offspring in mother machine lineages.

A. Finite State Projection method derivation
In this section we detail the construction of the finite state projection-based method for computing approximate solutions to the agent-based model.The method allows for the approximation error resulting from the truncation of the state space of the intracellular dynamics as well as error resulting from the finite time-horizon to be tracked and controlled.To that end, consider a truncated state space X ⊆ S for the trait x and let X denote the complement of X .The evolution equations that consider only the evolution of cells that remain within the truncated state space X is given by the following.
Let us denote Q X (τ )n(x, τ, t) = x ′ ∈X q(x, x ′ )n(x ′ , τ, t), ∀x ∈ X .(S7) In addition, we keep track of the number of cells that exit the truncated state space X via the following evolution equation: For brevity let us denote x∈ X q(x, x) = q(x).Note that the sum is taken over a potentially infinite complement X .However, in practice when constructing the models based on a defined stochastic reaction network we can often easily find the total rate q(x) corresponding to reactions leaving the state x ∈ X .Further, we can also compute the total rate x ′ ∈X q(x ′ , x) of transitioning from x to another state within the truncation X .We then use that q(x) = x ′ ∈X q(x ′ , x) + q(x) to evaluate the q(x) without having to sum over the infinite complement X .
To make analytical progress we consider the asymptotic behaviour of the defined system in the regime where the population grows exponentially with doubling rate λ.In particular, n(x, τ, t) ∼ e λt Π(x, τ ) and m(x, τ, t) ∼ e λt π(x, τ ) where Π and π are the snapshot densities of cells within the truncated state space and outside of it respectively.In the exponential growth regime the number of cells crossing the truncation boundary can then be given by λπ(x, τ ) = q(x)Π(x, τ ). (S9) The additional source of error that needs to be considered is the finite time interval for integrals involving cell age.Let us fix a finite time interval [0, τ max ].The proposed method reinitialises both the cells that leave the truncation X and cells that do not divide in the finite time interval [0, τ max ] according to the division kernel as newborn cells.Thus, we get the following system We then consider the conditional distribution Π(x|τ ) = Π(x,τ ) Π(τ ) .Substituting this into the joint evolution (Equation S10) and noting that the age-dependent marginal evolution is given by gives us where γ(τ ) is the marginal division rate E Π [γ(x, τ )|τ ] denoting the expectation taken with respect to Π(x|τ ).We note that then describes the density of cells that have not divided upon reaching the age τ leading to the first passage time distribution for a cell to divide from trait x at time τ , division distribution for brevity, given by Again by the definition of conditional distribution we can then derive the boundary condition for the conditional evolution as The exit probability corresponds to the cells leaving the truncation from state x ′ and not dividing within the finite time horizon [0, τ max ].Note that summing over x ′ ∈ X corresponds to the total exit probability ε for a cell to leave the state space This takes into account truncation as well as the finite time-horizon used to compute the transient evolution of Π ′ (x|τ ).Summing the boundary condition over x then gives In the solution algorithm, presented in the Methods section of the main text, the equations (S16) and (S19) are computed at every iteration step.The implementation of the solution algorithm makes use of the Julia FiniteS-tateProjection package [62] for the construction of truncated Q(τ ).Note that if ε(x, λ) = 0 for all x ∈ X then the boundary condition for the population lineage trees described in the main text by Equations ( 1) is recovered.Moreover would correspond to the Euler-Lotka equation for the population lineage trees in the case of the untruncated state space and the time horizon [0, ∞].

S2. RELATING GROWTH RATE SELECTION TO DIVISION RATE SELECTION
Here, we extend the snapshot density considered in Section S1 to cell size dynamics with growth-rate mediated selection.In particular, we assume that exponential growth of cell size ς with growth rate α(x), which continuously depends on the molecule numbers x.We arrive at the following evolution equation for the snapshot density with boundary condition The partitioning of the molecule numbers is now assumed to depend also on the proportion of the size ς ς ′ a daughter cell inherits from the mother cell of size ς ′ .The G(ς|ς ′ ) defines the probability of a cell with size ς ′ at division ending up with size ς after division.For example, for independent binomial partitioning, each molecule is partitioned into a daughter with a probability θ equal to the inherited size fraction, i.e., The inherited size fraction also defines the size division kernel: where δ is the Dirac delta function and ρ satisfies ρ(θ) = 1 2 ρ(θ) + 1 2 ρ(1 − θ) with ρ modelling the inherited size fraction accounting asymmetric cell division.The snapshot distribution in the asymptotic limit t → ∞ where the population grows exponentially with growth rate λ is then given by Integrating the division kernel over the birth sizes ς Using the law of conditional probability Π(x, ς, τ ) = Π(ς|x, τ )Π(x, τ ) and marginalising out ς, the evolution equation for the marginal snapshot distribution with boundary condition becomes We define as the marginal division rate and the effective division kernel, respectively.We then recover the model that depends only on the gene expression state x and the cell age τ : The model without cell size control is thus a reduced model of the developed cell size control model.

S3. MODEL DESCRIPTIONS A. Telegraph model
In this section we provide the complete specification of the agent-based telegraph model used in the main manuscript.The intracellular dynamics are given by the following stochastic reaction network where b ∼ Geometric( b) is a geometrically distributed random variable parametrised by the mean b.In practice, to simplify the implementation with Julia Catalyst package [63] the bursty reaction was truncated and split into multiple reactions where the rate of each reaction is weighed by the probability density p of the burst size.Note that the truncation of the bursty reaction is not necessary for the computations of q in Section S1 A. The total rate out of the each state with gene on and protein count x is the sum of k off , k deg and k prod .Thus all we need to find the rate corresponding to cells leaving the truncation is to consider the cumulative probability of bursts that do not leave the truncation.
In the case of neutral condition with no selection the division rate is given by γ(x, τ ) = g(τ ) where g(τ ) is the hazard of the Gamma distribution parametrised by mean its µ and squared coefficient of variation cv 2 .In general, if f and F are the probability density and cumulative distribution functions of a distribution respectively, then the hazard of the distribution is defined by In the selection case the division rate is given by γ(x, τ ) = s(x)g(τ ) where s(x) is a Hill function The parameters are chosen to describe the slow-switching dynamics of the promoter state to give rise to bimodal expression levels and are given in the table below.

B. Single gene feedback models
There are several models that follow the same general structure presented in the main text.We start by presenting the general structure of the agent-based and effective dilution formulations of the single gene feedback models.The specific instances of these models are then arrived to by choosing different ways to define the rate functions within the models.

Agent-based model
The intracellular dynamics of the agent-based model are given by the following general stochastic reaction network modelling production and degradation of a single protein x with rates r prod (⃗ x) and r deg (⃗ x) respectively.The general model assumes the division rate γ(x, τ ) is in the form The partitioning of mother cell protein numbers at division follows symmetric binomial partitioning.

Effective dilution model
In the effective dilution model we add a linear dilution term to the stochastic reaction network where r dil (⃗ x) is the rate of dilution modelling the loss of protein due to cell divisions in an exponentially growing population.A common approach when no division-rate selection on x is considered is to take r dil (⃗ x) = λ where λ is the doubling time of the cell population.The stationary solutions for the birth-death processes arising from the effective dilution models were computed using standard steady state methods [64].

Model instances
There are three different variants of the single protein feedback model that are considered in this paper.Each of them use the same general outline defined in Sections S3 B 1 and S3 B 2.
a. Transcriptional feedback model The transcriptional feedback model considers the protein x promoting its own production without division-rate selection on x by defining the rate of production and degradation as The degradation rate of the protein is taken to be 0 for simplicity.This model considers no division-rate selection on x and thus we take s(x) = 1.The age-dependent division-rate component g(τ ) is given as the hazard of the Gamma distribution parametrised by its mean µ and the squared coefficient of variation cv 2 .Let f and F denote the probability density and cumulative distribution functions of the Gamma distribution respectively.The division rate γ(x, τ ) is then given by The dilution rate in the effective dilution model is defined as where λ is the doubling time of an exponentially growing cell population resulting from the Gamma distributed interdivision time distribution above.The parametrisations chosen to give rise to bimodal behaviour are given in the table below.while the age-dependent component is taken to be constant g(τ ) = 1.0 corresponding to exponential division time distribution when no selection on gene expression is considered.The dilution rate in the effective dilution model is defined as The parametrisations of the model are given in the table below.
The division rate function is similar to growth feedback model (with higher Hill exponent) and is given by a Hill function dependent on the protein counts In the effective dilution model consider the corresponding dilution term The parametrisations chosen to give rise to bimodal behaviour are given in the table below.Here we provide the complete specifications of the genetic toggle switch model used in the main manuscript.

Agent-based model
The intracellular dynamics of the agent-based model are given by the following stochastic reaction network where the functions defining the rates of production and degradation for the two protein counts x A and x B are as follows: In particular, the proteins A and B inhibit each other's expression.We call α the induction strength.The division rate γ(x, τ ) for the agent-based model is then given by γ(x, τ ) = s(x)g(τ ). where In particular, we are considering positive division-rate selection on the protein A.

Effective dilution model
As previously, in the effective dilution model we add a linear dilution term to the stochastic reaction network where r dil (x) is the rate of dilution modelling the loss of protein due to cell divisions in an exponentially growing population.The dilution rate in the effective dilution model is defined as The parametrisations of the model are given in the table below.

S4. PARAMETER AND DIVISION RATE INFERENCE A. Conversion factor
The Escherichia coli datasets [36,47] analysed in the main text report fluorescence intensity of the reporter protein.In order to fit the agent-based model to data we convert the intensity to copy numbers of the protein by analysing the variability in the mother and daughter cell fluorescences at division as done, for example, in [65].To that end we find a linear relationship f = ax between the copy numbers of fluorescent molecules x and fluorescence intensity f .This conversion is done under the assumption that partitioning of the fluorescent molecules between the two daughter cells at division is binomial with probability p = 1 2 .Variance of the daughter cell fluorescence f d is then given by where x m and f m are the molecule count and fluorescence of the mother cell respectively.From the data we then fit the slope ā of Var (f d ) = āf m and equate it with a 4 to estimate the conversion factor a.

B. Bursty gene expression model
The agent-based model considered for both the DNA damage response and antibiotic resistance data sets is the same with different parametrisations.The intracellular dynamics of the agent-based model are given by the following stochastic reaction network where b ∼ Geometric( b) is a geometrically distributed random variable parametrised by the mean b (see Section S3 A for implementation details).The division rate of the model is given by γ(x, τ ) = s(x)g(τ ) where g(τ ) is the hazard of the kernel density estimate of the interdivision times in the data.If f and F denote the probability density and cumulative distribution functions of the kernel density estimate The function s(x) is fitted to individual conditions.In the cases of no antibiotic treatment and no induced DNA damage we assume there is no division-rate selection on the gene expression and thus the s(x) = 1.In the cases of antibiotic treatment and induced DNA damage we fit the parameters of a Hill function of the form The partitioning of mother cell molecule numbers at division follows symmetric binomial partitioning.The parameters fitted for the DNA damage response model is given in Table S1 and the parameters fitted for the antibiotic resistance model are given in Table S2.

C. Bayesian optimization
The Bayesian optimisation routine uses the scikit-optimize Python package with the default options [66].In particular, the optimiser uses the Matérn kernel with smoothness parameter ν = 2.5 corresponding to twice differentiable functions.The length scales of the kernel are tuned by the optimising routine by maximising the log marginal likelihood.Finally, the acquisition function is chosen probabilistically at every iteration between the lower confidence bound, negative expected improvement and negative probability improvement acquisition functions implemented in the package.We performed the parameter optimisation with 200 evaluations of the likelihood function and initialised with 10 random parametrisations.This was replicated 5 times with the optimal parametrisation across the replications chosen.The bounds used to constrain the search space are given in the table below.On short time scales the mother machine lineages and lineage tree histories agree with the qualitative dynamics of each other.However, over time the lineage tree histories settle in the fast dividing cell state.
S1. SNAPSHOT DISTRIBUTIONS IN AGENT-BASED POPULATIONSWe consider the biochemical reactions of the form

α
k0 k1 K1 K2 δ k3 K3 {16.0, 16.5, • • • , 22.0} 0.4 20.0 20.0 20.0 0.2 4.0 120.0 1.4643 N/A N/A N/A N/A DNA damage 0.2070 1.4643 1.0469 44.68 0.0 11 FIG. S1.Convergence of the mother machine lineage statistics of the telegraph model.(A-B) Birth (panel A) and division (panel B) distributions of the mother machine lineage model with fixed a truncation size and τmax converges in a couple of iterations and agrees with the agent-based simulation.(C) The exit probability due to the FSP truncation decreases with the truncation size and time horizon τmax.
FIG. S7.Agent-based stochastic simulations of lineage tree histories (red) and mother cell lineages (blue) of the genetic toggle switch for induction strength α = 18.9 (model details in SI S3 C). (A-B) Simulations trajectories starting with a cell in a slow dividing state (protein A count 5, B count 75).(C-D) Simulations trajectories starting with a cell in a fast dividing state (protein A count 35, B count 0).
kon k off k prod mean burst size b k deg Gamma(µ, cv 2 ) Hill exponent n Hill coefficient K

TABLE S1 .
The parameters fitted for the DNA damage response model via Bayesian Optimisation.

TABLE S2 .
The parameters fitted for the antibiotic resistance model via Bayesian Optimisation.